A New Integrated Open-source Morphological Analyzer for Hungarian
Abstract
The goal of a Hungarian research project has been to create an integrated Hungarian natural language processing framework. This infrastructure includes tools for analyzing Hungarian texts, integrated into a standardized environment. The morphological analyzer is one of the core components of the framework. The goal of this paper is to describe a fast and customizable morphological analyzer and its development framework, which synthesizes and further enriches the morphological knowledge implemented in previous tools existing for Hungarian. In addition, we present the method we applied to add semantic knowledge to the lexical database of the morphology. The method utilizes neural word embedding models and morphological and shallow syntactic knowledge.
Similar resources
Using a morphological analyzer in high precision POS tagging of Hungarian
The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open-source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state-of-the-art tagging methods and is able to handle out of v...
Automatic Diacritics Restoration for Hungarian
In this paper, we describe a method based on statistical machine translation (SMT) that is able to restore accents in Hungarian texts with high accuracy. Due to the agglutination in Hungarian, there are always plenty of word forms unknown to a system trained on a fixed vocabulary. In order to be able to handle such words, we integrated a morphological analyzer into the system that can suggest a...
Automatic morphological analysis of learner Hungarian
In this paper, we describe a morphological analyzer for learner Hungarian, built upon limited grammatical knowledge of Hungarian. The rule-based analyzer requires very few resources and is flexible enough to do both morphological analysis and error detection, in addition to some unknown word handling. As this is work-in-progress, we demonstrate its current capabilities, some areas where analysi...
Morphological Analyzer and Generator for Russian and Ukrainian Languages
pymorphy2 is a morphological analyzer and generator for Russian and Ukrainian languages. It uses large efficiently encoded lexicons built from OpenCorpora and LanguageTool data. A set of linguistically motivated rules is developed to enable morphological analysis and generation of out-of-vocabulary words observed in real-world documents. For Russian, pymorphy2 provides state-of-the-art morpholo...
Leveraging the open source ispell codebase for minority language analysis
The ispell family of spellcheckers is perhaps the single most widely ported and deployed open-source language tool. Here we describe how the SzóSzablya ‘WordSword’ project leverages ispell’s Hungarian descendant, HunSpell, to create a whole set of related tools that tackle a wide range of low-level NLP-related tasks such as character set normalization, language detection, spellchecking, stemmin...